Email Classification Using Machine Learning Algorithms
Abstract
Email has become one of the most frequently used forms of communication, and almost everyone has at least one email account. The inflow of spam messages is a major problem faced by email users. Many spam filtering techniques exist today, but as these techniques have emerged, spammers have improved their methods of spamming. An effective spam filtering technique is therefore a pressing need. In this paper, emails are classified using machine learning algorithms. Two important algorithms, Naïve Bayes and the J48 decision tree, are tested for their efficiency in classifying emails as spam or ham. The experiment focuses on classification combined with pre-processing techniques and concepts from text categorization. The dataset used is the Enron corpus, and the TF-IDF value is used as the weight score of each term. The classifiers are also tested with different feature sizes. The test results show that J48 classifies emails as spam or ham more accurately, while requiring a minimal feature size and classification time.
Keywords: Spam, Machine learning algorithm, Naïve Bayes, J48 Decision Tree, Pre-processing, Enron, TF-IDF, Feature size.
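The abstract combines pre-processing, TF-IDF term weighting, Naïve Bayes classification, and a sweep over feature sizes. Below is a minimal, dependency-free sketch of that kind of pipeline, written under explicit assumptions: the tokenizer, the stop-word list, the smoothed IDF formula, "keep the top-k terms by summed TF-IDF" as the feature-selection rule, feeding TF-IDF weights into a multinomial Naïve Bayes, and the toy emails are all illustrative choices, not the paper's method or Enron data. All helper names (`tokenize`, `fit_idf`, `tfidf`, `select_features`, `TfidfNaiveBayes`) are hypothetical. The J48 decision tree (Weka's implementation of C4.5) is not reproduced here.

```python
import math
import re
from collections import Counter

# Assumed pre-processing: lowercase, keep alphabetic tokens, drop a tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "for", "on", "in", "you", "your"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(token_docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, learned from training emails only."""
    n = len(token_docs)
    df = Counter()
    for tokens in token_docs:
        df.update(set(tokens))  # document frequency: count each term once per email
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(tokens, idf):
    """TF (relative frequency in the email) times IDF; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def select_features(vectors, k):
    """Assumed rule: keep the k terms with the largest TF-IDF weight summed over training emails."""
    score = Counter()
    for vec in vectors:
        score.update(vec)  # Counter.update adds the float weights per term
    return {term for term, _ in score.most_common(k)}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes that uses TF-IDF weights in place of raw term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, vectors, labels, features):
        self.features = features
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            weight = Counter()
            for vec, y in zip(vectors, labels):
                if y == c:
                    weight.update({t: w for t, w in vec.items() if t in features})
            denom = sum(weight.values()) + self.alpha * len(features)
            self.log_lik[c] = {t: math.log((weight[t] + self.alpha) / denom) for t in features}
        return self

    def predict(self, vec):
        def log_posterior(c):
            # Log prior plus TF-IDF-weighted log likelihoods of the selected features.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vec.items() if t in self.features
            )
        return max(self.classes, key=log_posterior)


# Toy, made-up emails (not Enron data), used only to exercise the pipeline.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap meds free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch meeting on friday", "ham"),
]
test = ["claim your free money", "agenda for the project meeting"]

train_tokens = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]
test_vecs = [tfidf(tokenize(text), idf) for text in test]

# Re-train with several feature sizes, mirroring the feature-size experiment described above.
results = {}
for k in (3, 10, 17):
    model = TfidfNaiveBayes().fit(train_vecs, labels, select_features(train_vecs, k))
    results[k] = [model.predict(vec) for vec in test_vecs]
    print(k, results[k])
```

On these toy emails, each feature size (3, 10, and all 17 vocabulary terms) labels the first test email spam and the second ham. IDF is fitted on the training set only so that test emails cannot leak into the weights; a real comparison with J48 would also need held-out accuracy and timing measurements over a corpus such as Enron.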
Similar resources
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
Body Mass Index Classification based on Facial Features using Machine Learning Algorithms for utilizing in Telemedicine
Background and Objectives: Due to the impact of controlling BMI on life, BMI classification based on facial features can be used for developing Telemedicine systems and eliminating the limitations of measuring tools, especially for paralyzed people. So that physicians can help people online during the Covid-19 pandemic. Method: In this study, new features and some previous work features were e...
Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images
Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-required, free satellite data are available in coarse resolution and the use of manned aircraft is relatively costly. Recently, unmanned aeria...
Trust Classification in Social Networks Using Combined Machine Learning Algorithms and Fuzzy Logic
Social networks have become the main infrastructure of today’s daily activities of people during the last decade. In these networks, users interact with each other, share their interests on resources and present their opinions about these resources or spread their information. Since each user has a limited knowledge of other users and most of them are anonymous, the trust factor plays an import...
Active Learning to Classify Email
While the technique of active learning has been applied successfully in improving text classification, its use in email classification has still not been explored. This paper examines several of the state-of-the-art algorithms for active learning with support vector machines as they are applied to email folder classification. We also introduce several extensions to these methods specifically des...